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As a key step towards a complete automation of the finite element method, we present a new 
algorithm for automatic and efficient evaluation of multilinear variational forms. The algorithm 
has been implemented in the form of a compiler, the FEniCS Form Compiler FFC. We present 
benchmark results for a series of standard variational forms, including the incompressible Navier- 
Stokes equations and linear elasticity. The speedup compared to the standard quadrature-based 
approach is impressive; in some cases the speedup is as large as a factor 1000. 
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1. INTRODUCTION 

The finite element metliod provides a general mathematical framework for the so- 
lution of differential equations and can be viewed as a machine that automates the 
discretization of differential equations; given the variational formulation of a differ- 
ential equation, the finite element method generates a discrete system of equations 
for the approximate solution. 

This generality of the finite element method is seldom reflected in codes, which 
are often very specialized and can only solve one particular differential equation or 
a small set of differential equations. 

There are two major reasons that the finite element method has yet to be fully 
automated; the first is the complexity of the task itself, and the second is that 
specialized codes often outperform general codes. We address both these concerns 
in this paper. 

A basic task of the finite element method is the computation of the element 
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stiffness matrix from a bilinear form on a local element. In many applications, 
computation of element stiffness matrices accounts for a substantial part of the 
total run-time of the code. [Kirby et al. 2005, SISC] This routine is a small amount 
of code, but it can be tedious to get it both correct and efficient. While the standard 
quadrature-based approach to computing the element stiffness matrix works on very 
general variational forms, it is well-known that precomputing certain quantities in 
multilinear forms can improve the efficiency of building finite element matrices. 

The methods discussed in this paper for efficient computation of element stiffness 
matrices are based on ideas previously presented in [Kirby et al. 2005, SISC] and 
[Kirby et al. 2005, BIT], where the basic idea is to represent the element stiffness 
matrix as a tensor product. A similar approach has been implemented earlier in the 
finite element library DOLFIN [Hoffman et al. 2005; Hoffman and Logg 2002], but 
only for linear elements. The current paper generalizes and formalizes these ideas 
and presents an algorithm for generation of the tensor representation of element 
stiffness matrices and for evaluation of the tensor product. This algorithm has 
been implemented in the form of the compiler FFC for variational forms; the com- 
piler takes as input a variational form in mathematical notation and automatically 
generates efficient code (C or C-I--I-) for computation of element stiffness matrices 
and their insertion into a global sparse matrix. This includes the generation of code 
both for the computation of element stiffness matrices and local-to-global mappings 
of degrees of freedom. 

1.1 FEniCS and the Automation of CMM 

FFC, the FEniCS Form Compiler, is a central component of FEniCS [Hoffman 
et al. 2005], a project for the Automation of Computational Mathematical Modeling 
(ACMM). The central task of ACMM, as formulated in [Logg 2004], is to create 
a machine that takes as input a model R{u) = A{u) — /, a tolerance TOL > 
and a norm || • || (or some other measure of quality), and produces as output an 
approximate solution U ~ u that satisfies the accuracy requirement )|J7 — u|| < TOL 
using a minimal amount of work (see Figure 1). This includes an aspect of reliability 
(the produced solution should satisfy the accuracy requirement) and an aspect of 
efficiency (the solution should be obtained with minimal work). 



R(u) = 



TOL > 



Fig. 1. The Automation of Computational Mathematical Modeling. 

In many applications, several competing models are under consideration, and one 
would like to computationally compare them. Developing separate, special purpose 
codes for each model is prohibitive. Hence, a key step of ACMM is the automation 
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of discretization, i.e., the automatic translation of a differential equation into a 
discrete system of equations, and as noted above this key step is automated by the 
finite element method. The FEniCS Form Compiler FFC may then be viewed as 
an important step towards the automation of the finite element method, and thus 
as an important step towards the Automation of CMM. 

FEniCS software is free software. In particular, FFC is licensed under the GNU 
General Public License [Free Software Foundation 1991]. All FEniCS software is 
available for download on the FEniCS web site [Hoffman et al. 2005]. In Section 5.6, 
we return to a discussion of the different components of FEniCS and their relation 
to FFC. 

1.2 Current finite element software 

Several emerging projects seek to automate important aspects of the finite element 
method. By developing libraries in existing languages or new domain-specific lan- 
guages, software tools may be built that allow programmers to define variational 
forms and other parts of a finite element method with succinct, mathematical syn- 
tax. Existing C-I-+ libraries for finite elements include DOLFIN [Hoffman et al. 
2005; Hoffman and Logg 2002], Sundance [Long 2003], deal.II [Bangerth et al. 
2005], Diffpack [Langtangen 1999] and FEMSTER [Castillo et al. 2004; Castillo 
et al. 2005]. Projects developing domain-specific languages for finite element com- 
putation include FreeFEM [Pironneau et al. 2005] and GotDP [Dular and Geuzaine 
2005]. A precursor to the FEniCS project, Analysa [Bagheri and Scott 2003], was 
a Scheme-like language for finite element methods. Earlier work on object-oriented 
frameworks for finite element computation include [Mackie 1992] and [Masters et al. 
1997]. 

While these tools are effective at exploiting modern software engineering to pro- 
duce workable systems, we believe that additional mathematical insight will lead 
to even more powerful codes with more general approximating spaces and more 
powerful algorithms. The FEniCS project is more ambitious than to just collect 
and implement existing ideas. 

1.3 Design goals 

The primary design goal for FFC is to accept as input "any" multilinear variational 
form and "any" finite element, and to generate code that will run with close to 
optimal performance. 

We will make precise below in Section 3.2 which forms and which elements the 
compiler can currently handle (general multilinear variational forms with coeffi- 
cients over affine simplices). 

A secondary goal for FFC is to create a new standard in form evaluation; hope- 
fully FFC can become a standard tool for practitioners solving partial differential 
equations using the finite element method. In addition to generating very efficient 
code for evaluation of the element stiffness matrix, FFC thus takes away the burden 
of having to implement the code from the developer. Furthermore, if the code for 
the element stiffness matrix is generated by a compiler that is trusted and has gone 
through rigorous testing, it is easier to achieve correctness of a simulation code. 

The primary output target of FFC is the C++ library DOLFIN. By default, 
FFC accepts as input a variational form and generates code for the evaluation of 
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the variational form in DOLFIN, as illustrated in Figure 2. Although FFC works 
closely with other FEniCS components, such as DOLFIN, it has an abstraction 
layer that allows it to be hooked up to multiple backends. One example of this is 
the newly added ASE (ANL SIDL Environment, [Balay et al. 2005a]) format added 
to FFC, allowing forms to be compiled for the next generation of PETSc [Balay 
et al. 2005b]. 



a{v, u) = \7v{x) ■ \7u{x) dx 




FFC 




Poisson . h 


> 


— > 



Fig. 2. The form compiler FFC takes as input a variational form and generates code for evaluation 
of the form. 



1.4 The compiler approach 

It is widely accepted in developing software for scientific computing that there is 
a trade-off between generality and efficiency; a software component that is general 
in nature, i.e., it accepts a wide range of inputs, is often less efficient than another 
software component that performs the same job on a more limited set of inputs. As 
a result, most codes used by practitioners for the solution of differential equations 
are very specific. 

However, by using a compiler approach, it is possible to combine generality and 
efficiency without loss of generality and without loss of efficiency. This is possi- 
ble since our compiler works on a very small family of inputs (multilinear varia- 
tional forms) with sharply defined mathematical properties. Our domain-specific 
knowledge allows us to generate much better code than if we used general-purpose 
compiler techniques. 

1.5 Outline of this paper 

Before presenting the main algorithm, we give a short background on the imple- 
mentation of the finite element method and the evaluation of variational forms in 
Section 2. The main algorithm is then outlined in Section 3. In Section 4, we 
compare the complexity of form evaluation for the algorithm used by FFC with the 
standard quadrature-based approach. We then discuss the implementation of the 
form compiler in some detail in Section 5. 

In Section 6, we compare the CPU time for evaluating a series of standard 
variational forms using code automatically generated by FFC and hand-coded 
quadrature-based implementations. The speedup is in all cases significant, in the 
case of cubic Lagrange elements on tetrahedra a factor 100 (Figure 3). 

Finally, in Section 7, we summarize the current features and shortcomings of 
FFC and give directions for future development and research. 

2. BACKGROUND 

In this section, we present a quick background on the finite element method. The 
material is standard [Ciarlet 1976; Hughes 1987; Brenner and Scott 1994; Eriksson 
et al. 1996], but is included here to give a context for the presentation of the form 
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Fig. 3. Speedup for a series of standard variational forms (here compiled for cubic Lagrange 
elements on triangles and tetraliedra, respectively). 



compiler and to summarize the notation used throughout the remainder of this 
paper. 

For simphcity, we consider here only linear partial differential equations and 
note that these play an important role in the discretization of nonlinear partial 
differential equations (in Newton or fixed-point iterations). 

2.1 Variational forms 

We work with the standard variational formulation of a partial differential equation: 
Find u ^ V such that 



a{v, u) = L{v) \/v 6 V, 



(1) 



with a : F X V — )• ffi. a bilinear form, L : V M. a, linear form, and {V, V) 
a pair of suitable function spaces. For the standard example, Poisson's equation 
—Au{x) — ,f{x) with homogeneous Dirichlet conditions on a domain $7, the bilinear 
form a is given by a{v, u) — J^^ Vw(x) • Vw(a;) da:, the linear form L is given by 
L{v) = J^v{x)f{x)dx, andV = V = H^in). 

The finite element method discretizes (1) by replacing {V , V) with a pair of 
(piecewise polynomial) discrete function spaces. With {4>i}fLi a basis for the test 
space V and {4>i}fii a basis for the trial space V, we can expand the approximate 
solution U of (1) in the basis functions of V, U = X^jli G'/'ji obtain a linear 
system — b for the degrees of freedom {£,j}jLi of the approximate solution U. 
The entries of the matrix A and the vector b defining the linear system are given 
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by 

Aij =a{4>i,(t>j), i,j = l,...,M, 
bi = L{$i), i = l,...,M. 

2.2 Assembly 

The standard algorithm for computing the matrix A (or the vector b) is assembly; 
the matrix is computed by iteration over the elements -ftT of a triangulation T of fi, 
and the contribution from each local element is added to the global matrix A. 

To see this, we note that if the bilinear form a is expressed as an integral over 
the domain il, we can write the bilinear form as a sum of element bilinear forms, 
a{v,u) = Xlifgr '^^(^''")' '^^^ 

Aij = ^ aK{(i>i,(l)j), i,j = ^,---,M. (3) 

In the case of Poisson's equation, the element bilinear form ax is defined by 

aK^v, u) = Jj^ Vu(x) • 'Vu{x) dx. 

Let now {(/>f' be the restriction to K of the subset of {<j)i}f'Li supported on 
K and {^f^j-L^ the corresponding local basis for V. Furthermore, let be a 
mapping from the local numbering scheme to the global numbering scheme (local- 
to-global mapping) for the basis functions of V, so that is the restriction to K 
of (t>t{K,i)^ and let i{K, •) be the corresponding mapping for V. We may now express 
an algorithm for the computation of the matrix A (Algorithm 1). 



Algorithm 1 A = Assemble(a, T, V, V) 
A = 
for K gT 

for i = 1, . . . ,n 

for j = l,...,n 

At(K,i)iiK,j) = A^K,i)i{K,j) + axiw ,<i>f) 
end for 
end for 
end for 



Alternatively, one may define the element matrix A^ by 

Af^ = aK{^f ,<l>f) i,i = l,...,n, (4) 

and separate the computation on each element K into two steps; computation of 
the element matrix A^ and insertion of A^ into A (Algorithm 2). 

Separating the two concerns of computing the element matrix A^ and adding it 
to the global matrix A as in Algorithm 2 has the; advantage that one may use an 
optimized library routine for adding the element matrix A^ to the global matrix A. 
Sparse matrix libraries such as PETSc [Balay et al. 2005b; Balay et al. 2004; Balay 
et al. 1997] often provide optimized routines for this type of operation. Note that 

ACM Transactions on Mathematical Software, Vol. V, No. N, Month 20YY. 



A Compiler for Variational Forms • 7 



Algorithm 2 A = Assemble(a, T, V, V) 
A = 
for K gT 

Compute according to (4) 

Add A^ to A using the local-to-global mappings {l{K, ■),l{K, •)) 
end for 



the cost of adding A^ to A may be substantial even with an efficient implementation 
of the sparse data structure for A [Kir by et al. 2005, SISC]. 

As we shall see below, we may also take advantage of the separation of concerns 
of Algorithm 2 to optimize the computation of the clement matrix A^ . This step 
is automated by the form compiler FFC. Given a bilinear (or multilinear) form a, 
FFC automatically generates code for run-time computation of the element matrix 
A^. 

3. EVALUATION OF MULTILINEAR FORMS 

In this section, we present the algorithm used by FFC to automatically generate 
efficient code for run-time computation of the element matrix A^ . 

3.1 Multilinear forms 

Let {Vj}^^! be a given set of discrete function spaces defined on a triangulation T 
of O C M**. We consider a general multilinear form a defined on the product space 

Vi X V2 X ■ ■ ■ X Vr-. 

a:VixV2X ■■■ xVr^R. (5) 

Typically, r = 1 (linear form) or r = 2 (bilinear form), but the form compiler FFC 
can handle multilinear forms of arbitrary arity r. Forms of higher arity appear 
frequently in applications and include variable coefficient diffusion and advection 
of momentum in the incompressible Navier-Stokes equations. 

Let now {</>|h=\, • • • , Mli^i be bases of V,,V2,...,Vr and let i = 

(ii, i2, . . . , ir) be a multiindex. The multilinear form a then defines a rank r tensor 
given by 

A=ai<Pl^,^^,...,cK^). (6) 

In the case of a bilinear form, the tensor A is a matrix (the stiffness matrix), and 
in the case of a linear form, the tensor A is a vector (the load vector). 

As discussed in the previous section, to compute the tensor A by assembly, we 
need to compute the element tensor A^ on each element K of the triangulation 
T of fi. Let {<5!'f^'^}"ii be the restriction to K of the subset of supported 
on K and define the local bases on K for V2,. ■ . ,Vr similarly. The rank r element 
tensor A^ is then defined by 

Af = a^{^f;'\cp^'\...,<pl-)- (7) 

3.2 Evaluation by tensor representation 

The element tensor A^ can be efficiently computed by representing A^ as a special 
tensor product. Under some mild assumptions which we shall make precise below, 
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the element tensor can be represented as the tensor product of a reference 
tensor A° and a geometry tensor Gk- 

Af = AlG%, (8) 

or more generally a sum Af = A^^C^ of such tensor products, where i and 

a are multiindiccs and we use the convention that repetition of an index means 
summation over that index. The rank of the reference tensor is the sum of the 
rank r ~ \i\ of the element tensor and the rank |q;| of the geometry tensor Gk- As 
we shall sec, the rank of the geometry tensor depends on the specific form and is 
typically a function of the Jacobian of the mapping from the reference element and 
any variable coefficients appearing in the form. 

Our goal is to develop an algorithm that converts an abstract representation of a 
multilinear form into (i) the vahics of the reference tensor A'^ and (ii) an expression 
for evaluating the geometry tensor Gk for any given element K. Note that A'^ is 
fixed and independent of the element K and may thus be precomputed. Only Gk 
has to be computed on each clement. As we shall see below in Section 4, for a 
wide range of multilinear forms, this allows for computation of the element tensor 
A^ using far fewer floating-point operations than if the element tensor A^ were 
computed by quadrature on each element K. 

To sec how to obtain the tensor representation (8), we fix a small set of operations, 
allowing only multilinear forms that can be expressed through these operations, and 
observe how the tensor representation (8) transforms under these operations. As 
we shall see below, this covers a wide range of multilinear forms (but not all). We 
first develop the tensor representation in abstract form and then present a number 
of test cases that exemplify the general notation in Section 3.3. 

As basic elements, we take the local basis functions {(A^j-y = ^i{4'f'^}^Li 
a set of finite element spaces Vi, i = 1,2,..., including the finite element spaces 
Vi,V2, ■ . ■ ,Vr on which the multilinear form is defined. Allowing addition (pi + (j>2 
and multiplication with scalars acf), wc obtain a vector space A of linear combina- 
tions of basis functions. Since 0i — 02 = + (—1)^2 and (j)/a = (l/a)0, we can 
easily equip the vector space with subtraction and division by scalars. 

We next equip our vector space with multiplication between elements of the 
vector space. We thus obtain an algebra A of linear combinations of products of 
basis functions. Finally, we extend A by adding diflferentiation d/dxi with respect 
to a coordinate direction a;,, i = 1, . . . , d, on K, to obtain 

where (•) represents some multiindex. 

To summarize, A is the algebra of linear combinations of products of basis func- 
tions or derivatives of basis fini(;tions that is generated from the set of basis functions 
through addition (-h), subtraction (— ), multiplication (•), including multiplication 
with scalars, division by scalars (/), and differentiation d/dxi. Note that if the 
basis functions are vector- valued (or tensor- valued), the algebra is generated from 
the set of scalar components of the basis functions. 

We may now state precisely the multilinear forms that the form compiler FFC 
can handle, namely those multilinear forms that can be expressed as integrals over 
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K (or the boundary oi K) of elements of the algebra A. Note that not all integrals 
over K of elements of A are multilinear forms; in particular, each product needs to 
be linear in each argument of the form. 

The tensor representation (8) now follows by a standard change of variables using 
an afRne mapping Fk ■ Kq — >■ K from a reference element Kq to the current element 
K (sec Figure 4). With {^^j^ the basis functions on the reference clement corre- 
sponding to {07)7, defined by $7 = (pjoFx, wc obtain the following representation 

of the element tensor corresponding to Vi = (j2 C( ) 11 • 

JK 



(10) 



(•) 



where 



<^ = (/«.n^^d.Y)___, ,11) 

Gl,, =(c,.,<ietfinS^)^- (12) 

We have here used the fact that the mapping is affine to pull the determinant 
and transforms of derivatives outside of the integral. For a discussion of non-affine 
mappings, including the Piola transform and isoparametric mapping, see Section 3.4 
below. 

Note that the expression for the geometry tensor GK,k implicitly contains a sum- 
mation if an index is repeated twice. Also note that the geometry tensor contains 
any variable coefficients appearing in the form. 

As we shall see below in Section 5, the representation of a multilinear form as an 
integral over K of an element of A is automatically available to the form compiler 
FFC, since a multilinear form must be specified using the basic operations that 
generate A. 

3.3 Test cases 

To make these ideas concrete, we write down the explicit tensor representation (8) 
of the element tensor A^ for a series of standard forms. We return to these test 
cases below in Section 6 when we present benchmark results for each test case. 

Test case 1: the mass matrix. As a first simple example, we consider the compu- 
tation of the mass matrix M with Mj^jj = a{(f)\^ , and the bilinear form a given 

by 

a{v,u)= I v{x)u{x)dx. (13) 
in 
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By a change of variables given by the affine mapping Fk ■ Kq K, we obtain 

Af=[ <i>f^^\x)<i>l^\x)Ax = AetF'^ [ <^l{X)'^l{X)dX = A^,GK, (14) 
Jk Jko 

where = (X) dX and Gk = detF^. In this case, the reference 

tensor is a rank two tensor (a matrix) and the geometry tensor Gk is a rank zero 
tensor (a scalar). By precomputing the reference tensor A", we may thus compute 
the element tensor A^ on each element K by just multiplying the precomputed 
reference tensor with the determinant of (the derivative of) the affine mapping Fk- 

Test case 2: Poisson's equation. As a second example, consider the bilinear form 
for Poisson's equation, 

a{v,u) = / S/v{x) ■S/u{x)dx. (15) 
Jn 

By a change of variables as above, we obtain the following representation of the 
element tensor A^: 



Af= I V<i>l^\x).V<i>l^\x)dx 

JK 

7. 



U^' (16) 

dxp dX0 Jko 9Xai dXa2 
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where A? = ^^^'^^^^ dX and = det F' We sec that the 

reference tensor A" is here a rank four tensor and the geometry tensor Gk is a rank 
two tensor (one index for each derivative appearing in the form). 

Test case 3: Navier-Stokes. Consider now the nonhnear term u ■ Vu of the in- 
compressible Navier-Stokes equations, 

u + u- Vu — vAu + Vp = /, 

17) 

To solve the Navicr Stokes equations by fixed-point iteration (see for example 
[Eriksson et al. 2003]), we write the nonlinear term in the form u ■ Vu = w ■ Vu 
with w = u and consider w as fixed. We then obtain the following bilinear form: 



a{v,u) = aw{v,u) = / v{x) ■ {'w{x) ■ 'V)u{x) dx. (18) 
Jn 

Note that we may alternatively think of this as a trilinear form, a = a{v, u, w). 

Let now {w^}a be the expansion coefficients for w in a finite clement basis on the 
current element K, and let v[i] denote component i of a vector-valued function v. 
Furthermore, assume that u and w are discretized using the same discrete space 
V = V2. We then obtain the following representation of the element tensor A^: 



Af=j <i>f,^\x)-{w{x)-V)cl>l^\x)dx 

JK 



^ 2 (19) 

= detf;,^< J^^ $i^[,](xX[«,](X)^!|gP dX = AlCk, 

where = J^^^ ^}^W]{X)<i>lja,]{x f%iJ^''^ dX and G% = dct Fi^^w^. In 

this case, the reference tensor A'-^ is a rank five tensor and the geometry tensor Gk 
is a rank three tensor (one index for the derivative, one for the function w, and one 
for the scalar product). 

Test case 4-- linear elasticity. Finally, consider the strain-strain term of linear 
elasticity. 



a{v,u)= / \{Vv + {Vv)^):{Vu + {VuY)Ax 
Jn 4 



1 dvi dui ^ 1 dvi duj ^ 1 dvj dui ^ 1 dvj duj ^ 
n 4 dxj dxj 4 dxj dxi 4 dxi dxj 4 dxi dxi 



(20) 



Considering here only the first term, a change of variables leads to the following 
representation of the element tensor A^'^: 

ia <-^/3,](x) d<j)f/[M ix) 



j^K,i _ I 1 c/y,^ ipi\y.^) u^J^^ da: 

1 ^ , w dx^, dx^, f ^<^llfi^]ix) a$f^[/3,](x) 
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where A";^ = L '-^^^^^'-^h^ dX and G% , ^ IdetF^,^^. Here, 

the reference tensor A"'^ is a rank four tensor and the geometry tensor Gk,i is a 
rank two tensor (one index for each derivative). 

3.4 Extensions 

The current implementation of FFC supports only afhncly mapped Lagrange ele- 
ments and linear problems, but it is interesting to consider the generalization of 
our approach to other kinds of function spaces such as Raviart-Thomas [Raviart 
and Thomas 1977] elements for if(div) and curvilinear mappings such as arise with 
isoparametric elements, as well as how nonlinear problems may also be automated. 

3.4.1 i?(div) and i7(curl) conforming elements. Implementing of H{6xv) or 
iJ(curl) elements requires two kinds of generalizations to FFC. First of all, the basis 
functions are mapped from the reference element by the Piola transform [Brezzi and 
Fortin 1991] rather than the standard change of coordinates. With Fk : Kq ^ K 
the standard affinc mapping for an clement K , F'j^ the Frcchct derivative of the 
mapping and dct_F^ its determinant, the Piola mapping is defined by Fk{^) = 
^^F'k{^ o{FKy'). Since our tools already track Jacobians and determinants 
for differentiating through affine mappings, it should be straightforward to support 
the Piola mapping. Second, defining the mapping between local and global degrees 
of freedom becomes more complicated, as we must keep track of directions of vector 
components as done in FEMSTER [Castillo et al. 2004; Castillo et al. 2005]. 

As an example of using the Piola transform, we consider the Raviart-Thomas 
elements with the standard (vector-valued) nodal basis on the reference 

element. We compute the mass matrix M with Mj^i^ = a(V'ii , ■^i'ia) and the bilinear 
form a given by 

a{v,u) = I v{x) ■u{x)dx. (22) 
Jo. 

On K, the basis functions are given by V'f" = ^K{^i) and computing the element 
tensor , we obtain 

Af=f V^(x)-Vf(x)d:. 

JK 

where = J^^^ [ai]*,, dX and = dctV;, dZ, aZ, ' sec that, like 
for Poisson, the reference tensor A" is rank four and the geometry tensor Gk is 
rank two. 

3.4.2 Curvilinear elements. Our techniques may be generalized to cases in which 
the Jacobian varies spatially within elements, such as when curvilinear elements or 
general quadrilaterals or hexahedra are used. In this case, we can replace integration 
with summation over quadrature points and obtain a formulation based on tensor 
contraction, albeit with a higher run-time complexity than for afHne elements. 
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To illustrate this, we consider the case of the very simple bilinear form 




If the mapping from the reference element Kq to an element K is curvilinear, 
then we will be unable to pull the Jacobian and derivatives out of the integral. 
However, we will still be able to phrase a run-time tensor contraction with an extra 
index for the quadrature points. The element matrix for (24) is 




and changing coordinates we obtain 

Approximating the integral by quadrature, we let {-^fe}^i be a set of quadrature 
points on the reference element Kq with {wk}^^i the corresponding weights. We 
thus obtain the representation 




Note that A° may be computed entirely at compile-time, as can an expression 
for Ga, whereas the values of Ga depend on the geometry given at run-time. The 
tensors to be contracted at runtime have one extra dimension compared to a situa- 
tion where the mapping is affine. Still, this computation (once the geometry tensor 
is computed on an element) is readily phrased as a matrix- vector product. Hence, 
we have given an example of how the more traditional way of expressing quadrature 
and our precomputation are both instances of a high-rank tensor contraction. One 
could interpret this formulation as saying that quadrature (expressed as we have 
here) gives a general model for computation and that precomputation is possible 
as a compile-time optimization in the case where the mapping is affine. 

An open interesting problem would be to study under what conditions one could 
specialize a system such as FFC to use bases with tensor-product decompositions 
(available on unstructured as well as structured shapes [Karniadakis and Sherwin 
1999]) and automatically generate efficient matrix- vector products as are manually 
implemented in spectral element methods. 

3.4.3 Nonlinear forms. For a nonlinear variational problem, 

a{v,u) = L{v) VveV, (30) 
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with a nonlinear in m, we can solve by direct fixed-point iteration on the unknown 
u as discussed above for test case 3, or we can compute the (Frechet) derivative 
of the nonlinear form a and solve by Newton's method. For multilinear forms, 
defining the nonlinear residual and constructing the Jacobian can be performed 
with the current capabilities of FFC. 

To solve the variational problem (30) by Newton's method, we differentiate with 
respect to u to obtain a variational problem for the increment 6u: 



As an example, consider the nonlinear Poisson's equation —c{u)Au = f with c{u) = 
u. For more general c, considering the projection of c{u) into the finite element space 
leads to a multilinear form. Differentiating the form with respect to u, we obtain 
the following variational problem: Find 6u &V such that 



5uWv ■ Vu + uWv ■ W5udx = / vfdx- uWv ■ Wudx Vv € V. (32) 



Our current capabilities would allow us to define two forms, one to evaluate the 
nonlinear residual and another to construct the Jacobian matrix. This is sufficient 
to set up a nonlinear solver. Obviously, extending FFC to symbolically differenti- 
ate the nonlinear form would be more satisfying. We remark that the code Sun- 
dance [Long 2003] currently has such capabilities. It should be possible to leverage 
such tools in the future to combine our precomputation techniques with automatic 
differentiation. 

3.5 Optimization 

We consider here three different kinds of optimization that could be built into FFC 
in the future. For one, the current code is generated entirely straightline as a 
sequence of arithmetic and assignment. It should be possible to store the tensor 
A'-' in a contiguous array. Moreover, each Gk may be considered as a tensor or 
flattened into a vector. In the latter case, the action of forming the element matrix 
for one element may be written as a matrix- vector multiply using the level 2 BLAS. 
Once this observation is made, it is straightforward to see that we could form several 
Gk vectors and make better use of cache by computing several element matrices 
at once by a matrix- matrix multiply and the level 3 BLAS. 

This corresponds to a coarse-grained optimization. In other work [Kirby et al. 
2005, SISC], [Kirby et al. 2005, BIT], we have seen that for many forms, the entries 
m are related in such ways that various entries of the element matrices may be 
formed in fewer operations. For example, if two entries of A" are close together in 
Hamming distance, then the contraction of one entry with Gk can be computed 
efficiently from the other. As our code for optimization, FErari, evolves, we will 
integrate it with FFC as an optimizing backend. It will be simple to compare the 
output of FErari to the best performance using the BLAS, and let FFC output the 
best of the two (which may be highly problem-dependent). 

Finally, optimizations that arise from the variational form itself will be fruitful 
to explore in the future. For example, it should be possible to detect when a 
variational form is symmetric within FFC, as this leads to fewer operations to form 
the associated matrix. Moreover, for forms over vector-valued elements that have a 



{a{v,u) - L{v)) Vv€V. 



(31) 






ACM Transactions on Mathematical Software, Vol. V, No. N, Month 20YY. 



A Compiler for Variational Forms • 15 



Cartesian product basis (each basis function has support in only one component), 
other kinds of optimizations are appropriate. For example, the viscosity operator 
for Navicr-Stokcs is the vector Laplacian, which can be written as a block-diagonal 
matrix with one axis for each spatial dimension. By "taking apart" the basis 
functions, we hope to uncover this block structure, which will lead to more efficient 
compilation and hopefully more efficient code. 

4. COMPLEXITY OF FORM EVALUATION 

We now compare the proposed algorithm based on tensor representation to the 
standard quadrature-based approach. As wc shall see, tensor representation can be 
much more efficient than quadrature for a wide range of forms. 

4.1 Basic assumptions and notation 

To analyze the complexity of form evaluation, we make the following simplifying 
assumptions: 

— the form is bilinear, i.e., r = \i\ = 2; 

— the form can be represented as one tensor product, i.e., Af = A-'^G^; 
— the basis functions are scalar; 

— integrals are computed exactly, i.e., the order of the quadrature rule must match 

the polynomial order of the integrand. 

We shall use the following notation: q is the polynomial order of the basis func- 
tions on every element, p is the total polynomial order of the integrand of the form, 

d is the dimension of $1, n is the dimension of the function space on an clement, and 
N is the number of quadrature points needed to integrate polynomials of degree p 
exactly. 

Furthermore, let n j be the number of functions appearing in the form. For test 
cases 1-4 above, n/ = in all cases except test case 3 (Navier-Stokes) where n/ = 1. 
We use njj to denote the number of differential operators. For test cases 1-4, we 
have nn = in case 1 (the mass matrix), riD = 2 in case 2 (Poisson), = 1 in 
case 3 (Navier Stokes), and nu = 2 in case 4 (linear elasticity). 

4.2 Complexity of tensor representation 

The element tensor has n? entries. The number of basis functions n for polyno- 
mials of degree qind dimensions is ~ g**. To compute each entry Af of the element 
tensor A^ using tensor representation, wc need to compute the tensor product be- 
tween A^, and Gk- The geometry tensor Gk has rank nf + no and the number 
entries of Gk is n"^ rf"^ . The cost for computing the entries of the element 
tensor A^ using tensor representation is thus 

Note that there is no run-time cost associated with computing the tensor represen- 
tation (8), since this is computed at compile-time. Also note that we have not taken 
into account any of the optimizations discussed in Section 3.5. These optimizations 
can in some cases significantly reduce the operation count. 
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4.3 Complexity of quadrature 

To compute each entry Af of the element tensor using quadrature, we need 
to evaluate an integrand of total order p dX N quadrature points. The numbc^r of 
quadrature points needed to integrate polynomials of order p exactly in d dimensions 
\s N ^ p'^. Since the form is bilinear with basis functions of order q, the total order 
is p = 2q -\- n fq — njj. It is difficult to estimate precisely the cost of evaluating the 
integrand at each quadrature point, but a reasonable estimate is nf + nnd+l. Note 
that we assume that all basis functions and their derivatives have been pretabulated 
at all quadrature; points on the refcircincc; element. 

We thus obtain the following estimate of the total cost for computing the 
entries of the element tensor A^ using quadrature: 



4.4 Tensor representation vs. Quadrature 

Comparing tensor representation with quadrature, the speedup of tensor represen- 
tation is 



We immediately note that there can be a significant speedup for n/ = 0, since 
Tx/ri^ is then independent of the polynomial degree q. In particular, we note that 
for the mass matrix {uf — no —i)) the speedup is Tq/Tt ~ (2^)'', and for Poisson's 
equation (n/ = 0, nn = 2) the speedup is Tq/Tt ~ (2g - 2Y{2d + As we 

shall see below, the speedup for test cases 1-4 is significant, even for q = Q. 

On the other hand, we note that quadrature may be more efficient if n/ is large. 
Also, if one takes into account that underintegration is possible (choosing a smaller 
N than given by the polynomial order jj), it is less clear which approach is most 
efficient in any given case. It is known [Ciarlet 1976] that second order elliptic 
variational problems using polynomials of degree q require an integration rule that 
is exact only on polynomials of degree 2(7 — 2 to ensure the proper convergence rate, 
regardless of the arity of the form. However, our overall flop count is lower for 
bilinear and likely trilinear forms, and at any rate, our code is simpler for compilers 
to optimize than quadrature loops. 

Ultimately, one may thus imagine an intelligent system that automatically makes 
the choice between tensor representation and quadrature in each specific situation. 

5. IMPLEMENTATION 

We now discuss a number of important aspects of the implementation of the form 
compiler FFC. We also write down the forms for the test cases discussed above in 
Section 3.3 in the language of the form compiler FFC. Basically, we can consider 
FFC's functionality broken into three phases. First, it takes an expression for 
a multilinear form and generates the tensor A^ . While doing this, it derives an 
expression for evaluation of the element tensor Gk from the affine mapping and the 
coefficients of the form. Finally, it generates code for evaluating Gk and contracting 
it with A^, and for constructing the local-to-global mapping. 



Tq ~ n''N{nf + nod + 1) - {q'^fp'^{nf + nod + 1) 
~ q^'^{2q + Ufq - nD^inf + nod + 1). 



(34) 



Tq/Tt 



{2q + Hfq - noYinf + nod + 1) 



(35) 
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5.1 Parsing of forms 

The form compiler FFC implements a domain-specific language (DSL) for varia- 
tional forms, using Python as the host language. A language of variational forms 
is obtained by overloading the appropriate operators, including addition +, sub- 
traction -, multiplication (*), and diflFerentiation .dx(-) for a hierarchy of classes 
corresponding to the algebra A discussed above in Section 3.2. FFC thus uses the 
built-in parser of Python to process variational forms. 

5.2 Generation of the tensor representation 

The basic elements of the algebra are objects of the class BasisFunction, repre- 
senting (derivatives of) basis functions of some given finite element space. Each 
BasisFunction is associated with a particular finite element space and different 
BasisFunctions may be associated with different finite element spaces. Products 
of scalars and (derivatives of) basis functions are represented by the class Product, 
and sums of such products are represented by the class Sum. In addition, we include 
a class Function, representing linear combinations of basis functions (coefiicients) . 
In the diagrams of Tables I and II, we summarize the basic unary and binary 
operators respectively implemented for the hierarchy of classes. 

Note that by declaring a common base class for BasisFunction, Product, Sum, 
and Function, some of the operations can be grouped together to simplify the 
implementation. As a result, most operators will directly yield a Sum. Also note 
that the algebra of Sums is closed under the operations listed above. 



op 



.dx() 



P 
S 



Table I. Unary operators and their results for the classes BasisFunction (B), Function (F), 
Product (P), and Sum (S). 



+/- 

B 
F 
P 

S 



s s s s 

s s s s 

s s s s 

s s s s 



B 
F 
P 

s 



P S P s 

s s s s 

P S P s 

s s s s 



Table II. Binary operators and their results for the classes BasisFunction (B), Function (F), 
Product (P), and Sum (S). 



By associating with each object one or more indices, implemented by the class 
Index, an object of type Sum automatically represents a tensor, and by differentiat- 
ing between different types of indices, an object of type Sum automatically encodes 
the tensor representation (8). FFC differentiates between four different types of 
indices: primary, secondary, auxiliary, and fixed. A primary index («) is associated 
with the multiindex of the element tensor , a secondary index (a) is associated 
with the multiindex of the geometry tensor Gk, and thus the secondary indices 
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indicate along which dimensions to compute the tensor product A°^G^. Auxil- 
iary indices are internal indices within the reference tensor or the geometry 
tensor Gk and must be repeated exactly twice; summation is performed over each 
auxiliary index P before the tensor product is computed by summation over sec- 
ondary indices a. Finally, a fixed index is a given constant index that cannot be 
evaluated. Fixed indices are used to represent for example a derivative in a fixed 
coordinate direction. 

Implicit in our algebra is a grammar for multilinear forms. We could explicitly 
write an EBNF grammar and use tools such as lex and yacc to create a compiler for 
a domain-specific language. However, by limiting ourselves to overloaded operators, 
we successfully embed our language as a high-level library in Python. 

To make this concrete, consider test case 2 of section 3.3, Poisson's equation. 
The tensor representation = A^^G% is given by 

JKa 9Xai dXa2 ' ^^g^ 

G^ = deti^;,^^. 

There arc here two primary indices {ii and 12), two secondary indices (ai and a2), 
and one auxiliary index 

5.3 Evaluation of integrals 

Once the tensor representation (8) has been generated, FFC computes all entries 
of the reference tensor(s) by quadrature on the reference element. The quadrature 
rule is automatically chosen to match the polynomial order of each integrand. FFC 
uses FIAT [Kirby 2004; 2005] as the finite element backend; FIAT generates the set 
of basis functions, the quadrature rule, and evaluates the basis functions and their 
derivatives at the quadrature points. 

Although FIAT supports many families of finite elements, the current version of 
FFC only supports general order continuous/discontinuous Lagrange finite elements 
and first order Crouzeix-Raviart finite elements on triangles and on tetrahedra (or 
any other finite element with nodes given by pointwise evaluation). Support for 
other families of finite elements will be added in future versions. 

Computing integrals is the most expensive step in the compilation of a form. The 
typical run-time (of the compiler) ranges between 0.1 and 30 seconds, depending 
on the type of form and finite element. 

5.4 Generation of code 

When a form has been parsed, the tensor representation has been generated, and 
all integrals computed, code is generated for evaluation of geometry tensors and 
tensor products. The form compiler FFC has been designed to allow for generation 
of code in multiple different languages. Code is generated according to a specific 
format (which is essentially a Python dictionary) that controls the output code 
being generated, see Figure 5. The current version of FFC supports four output 
formats: C++ code for DOLFIN [Hoffman et al. 2005; Hoffman and Logg 2002], 
code (for verification and presentation purposes), a raw format that just 

ACM Transactions on Mathematical Software, Vol. V, No. N, Month 20YY. 



A Compiler for Variational Forms • 19 



lists the values of the reference tensors, and the recently added ASE format [Balay 
et al. 2005a] for compilation of forms for the next generation of PETSc. FFC can 
be easily extended with new output formats, including for example Python, C, or 
Fortran. 



Format 



poisson . form 



Parser 




Compiler 


► 



Poisson . h 



. J...T.7.1. 
1 1 



FIAT 



FErari 



Fig. 5. Diagram of the components of the form compiler FFC. 



5.5 Input/output 

FFC can be used cither as a Python package or from the command-line. We here 
give a brief description of how FFC can be called from the command- line to generate 
C++ code for DOLFIN. To use FFC from the command-line, one specifies the form 
in a text file in a special language for variational forms, which is simply Python 
equipped with the hierarchy of classes and operators discussed above in Section 5.1. 
In Table III we give the complete code for the specification of test case 2, Poisson's 
equation. Note that FFC uses tensor-notation, and thus the summation over the 
index i is implicit. Also note that the integral over an element K is denoted by 
*dx. 



element = FiniteElement ( "Lagrange" , "tetrahedron", 3) 

V = BasisFunction(element) 
u = BasisFunction(element) 
f = FunctionCelement) 
i = Index () 

a = v.dx(i)*u.dx(i)*dx 
L = v*f*dx 



Tabic III. The complete code for specification of test case 2, Poisson's equation, with piecewise 
cubics on tetrahedra in the language of FFC. Alternatively, the form can be specified in terms of 
standard operators: a = dot(grad(v) , grad(u))*dx. 
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Assuming that the form has been specified in the file Poisson.f orm, the form 
can be compiled using the command ff c Poisson.f orm. This generates the C++ 
file Poisson.h to be included in a DOLFIN program. In Tabic IV, wc include 
part of the output generated by FFC with input given by the code from Table III. 
In addition to this code, FFC generates code for the mapping t{-, •) from local to 
global degrees of freedom for each finite clement space associated with the form. 
Note that the values of the 10 x 10 element tensor (in the case of cubics on 
triangles) are stored as one contiguous array (block), since this is what the linear 
algebra backend of DOLFIN (PETSc) requires for assembly. 

void eval (double block [], const AffineMapSt map) const 
{ 

// Compute geometry tensors 

double G0_0_0 = map.det*(map.gOO*map.gOO + map.g01*map.g01) ; 
double G0_0_1 = map.det*(map.gOO*map.glO + map.g01*map.gll) ; 
double G0_1_0 = map.det*(map.glO*map.gOO + map.gll*map.g01) ; 
double G0_1_1 = map.det*(map.glO*map.glO + map.gll*map.gll) ; 

// Compute element tensor 

block [0] = 4.249999999999996e-01*G0_0_0 + 4.249999999999995e-01*G0_0_l + 

4.249999999999995e-01*G0_l_0 + 4.249999999999995e-01*G0_l_l; 

block [1] = -8.749999999999993e-02*G0_0_0 - 8.749999999999995e-02*G0_0_l; 

block [2] = -8.750000000000005e-02*G0_l_0 - 8.750000000000013e-02*G0_l_l; 

block [99] = 4.049999999999997e+00*G0_0_0 + 2 . 024999999999998e+00*G0_0_l + 
2.024999999999998e+00*G0_l_0 + 4.049999999999995e+00*G0_l_l; 

} 

Table IV. Part of the code generated by FFC for the input code from Table III. 



5.6 Completing the toolchain 

With the FEniCS project [Hoffman et al. 2005] , we have the beginnings of a working 
system realizing (in part) the Automation of Computational Mathematical Model- 
ing, and the form compiler FFC is just one of several components needed to com- 
plete the toolchain. FIAT automates the generation of finite element spaces and 
FFC automates the evaluation of variational forms. Furthermore, PETSc [Balay 
et al. 2005b; Balay ct al. 2004; Balay ct al. 1997], automating the solution of dis- 
crete systems, is used as the solver backend of FEniCS. A common C++ interface 
to the different FEniCS components is provided by DOLFIN. 

A complete automation of CMM, as outlined in [Logg 2004], is a major task and 
we hope that by a modular approach we can contribute to this automation. 

6. BENCHMARK RESULTS 

As noted above, the speedup for the code generated by the form compiler FFC can 
in many cases be significant. Below, we present a comparison with the standard 
quadrature-based approach for the test cases discussed above in Section 3.3. 
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The forms were compiled for a range of polynomial degrees using FFC ver- 
sion 0.1.6. This version of FFC does not take into account any of the optimizations 
discussed in Section 3.5, other than not generating code for multiplication with zero 
entries of the reference tensor. 

For the quadrature-based code, all basis functions and their derivatives were 
prctabulatcd at the quadrature points using FIAT. Loops for all scalar products 
were completely unrolled. 

In all cases, we have used the "coUapsed-coordinate" Gauss- Jacobi rules described 
by Karniadakis and Sherwin [Karniadakis and Sherwin 1999]. These take tensor- 
product Gaussian integration rules over the square and cube and map them to the 
reference simplex. These rules are not the best known (see for example [Dunavant 
1985]), but they are fairly simple to generate for arbitrary degree. Eventually, these 
rules will be integrated with FIAT, but even if we reduce the number of quadrature 
points by a factor of five, FFC still outperforms quadrature. 

The codes were compiled with gcc (g++) version 3.3.6 and the benchmark results 
presented below were obtained on an Intel Pentium 4 (CPU 3.0 GHz, 2GB RAM) 
running Debian GNU/Linux. The times reported are for the computation of each 
entry of the clement tensor on one million elements (scaled) . The total time can be 
obtained by multiplying with n^, the number of entries of the element tensor. The 
complete source-code for the benchmarks can be obtained from the FEniCS web 
site [Hoffman et al. 2005]. 

6.1 Summary of results 

In Table V, we summarize the results for test cases 1-4. In all cases, the speedup 
Tq/Tt is significant, ranging between a factor 10 1500. 

From Section 4, wo know that the speedup for the mass matrix should grow as 
g"^, but from Table V it is clear that the speedup is not quadratic for d = 2 and for 
d = 3, an optimum seems to be reached around q ~ 8. 

The reason that the predicted specdups arc not obtained in practice is that the 
complexity estimates presented in Section 4 only accoimt for the mimbor of floating- 
point operations. When the polynomial degree q grows, the number of lines of code 
generated by the form compiler grows. FFC unrolls all loops and generates one line 
of code for each entry of the element tensor to be computed. For a bilinear form, the 
number of entries is n? ~ q^"^. With q — 8, the number of lines of code generated is 
about 25, 000 for the mass matrix and Poisson in 3D, see Figure 6. As the number 
of lines of code grows, memory access becomes more important and dominates the 
run-time. Using BLAS to compute tensor products as discussed above might lead 
to more efficient memory traffic. 

Note however that although the optimal speedup is not obtained, the speedup is 
in all cases significant, even at g = 1. 

6.2 Results for test cases 

In Figures 7-10, we present the results for test cases 1-4 discussed above in Section 
3.3. In connection to each of the results, we include the specification of the form in 
the language used by the form compiler FFC. Because of limitations in the current 
implementation of FFC, the comparison is made for polynomial order g < 8 in test 
cases 1-2 and g < 4 in test cases 3-4. Higher polynomial order is possible but is 
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Lines of code generated 



A * * * Mass matrix 2D 
»»" Mass matrix 3D 
M-** Poisson 2D 
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Fig. 6. Lines of code generated by the form compiler FFC as function of the polynomial degree q. 
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Table V. Speedups Tq/Tt for test cases 1-4 in 2D and 3D. 



very costly to compile (compare Figure 6). 

7. CONCLUDING REMARKS AND FUTURE DIRECTIONS 

We have demonstrated a proof-of-concept form compiler that for a wide range of 
variational forms can generate code that gives significant speedups compared to the 
standard quadrature-based approach. 

The form compiler FFC is still in its early stages of development but is already in 
production use. A number of basic modules based on FFC have been implemented 
in DOLFIN and others are currently being developed (Navier-Stokes and updated 
elasticity). This will serve as a test bed for future development of FFC. 

Future plans for FFC include adding support for integrals over the boundary 
(adding the operator *ds to the language), support for automatic differentiation of 
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Fig. 7. Benchmark results for test case 1, the mass matrix, specified in FFC by a = v*u*dx. 
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Fig. 8. Benchmark results for test case 2, Poisson's equation, specified in FFC by a = 
V . dx ( i ) *u . dx ( i ) *dx. 
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Fig. 9. Benchmark results for test case 3, the nonhnear term of the incompressible Navier-Stokes 
equations, specified in FFC by a = v [i] *w[j] *u [i] .dx(j)*dx. 




Fig. 10. Benchmark results for test case 4, the strain-strain term of linear elasticity, specified in 
FFC by a = 0.25*(v[i] .dx(j) + v[j] .dx(i)) * (u[i] .dx(j) + u[j] .dx(i)) * dx. 
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nonlinear forms and automatic generation of dual problems and a posteriori error 
estimators [Eriksson et al. 1995; Becker and Rannacher 2001], optimization through 
FErari [Kirby ct al. 2005, SISC], [Kirby et al. 2005, BIT], adding support for new 
families of finite elements, including elements that require non-affine mappings from 
the reference element. In addition to general order continuous/discontinuous La- 
grange finite elements and Crouzeix-Raviart [Crouzeix and Raviart 1973] finite 
elements, the plan is to add support for Raviart-Thomas [Raviart and Thomas 
1977], Nedelec [Nedelec 1980], Brezzi-Douglas-Marini [Brezzi et al. 1985], Brezzi- 
Douglas-Fortin-Marini [Brezzi and Fortin 1991], Taylor-Hood [Boffi 1997; Brenner 
and Scott 1994], and Arnold- Winther [Arnold and Winther 2002] elements. 

Furthermore, the fact that Python is an interpreted language does impose a 
penalty on the performance of the compiler (but not on the generated code) . How- 
ever, this can be overcome by more extensive use of BLAS in numerically intensive 
parts of the compiler, such as the precomputation of integrals on the reference ele- 
ment. We also plan to investigate the use of BLAS for evaluation of tensor products 
as an alternative to generating explicit unrolled code. Other topics of interest in- 
clude automatic verification of the correctness of the code generated by the form 
compiler [Kirby et al. 2005, Verification]. 
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